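Research on the Baltic-Finnic languages of the Republic of Karelia increasingly prioritizes the methods and tools of corpus linguistics. Since 2016, linguists, mathematicians, and programmers at the Karelian Research Centre have been working on the Open Corpus of Veps and Karelian Languages (VepKar), an extension of the Veps Corpus created in 2009. The VepKar corpus comprises texts in Karelian and Veps, multifunctional dictionaries linked to them, and software with an advanced search system that supports various text criteria (language, genre, etc.) and numerous linguistic categories (lexical and grammatical search in texts was implemented thanks to the word-form generator we created earlier). A corpus of 3000 texts was compiled, texts were uploaded and marked up, a system for classifying texts by language, dialect, type, and genre was introduced, and the word-form generator was created. Future plans include developing a speech module for working with audio recordings and a syntactic tagging module that uses the output of morphological analysis. Owing to continuous functional improvements in the corpus manager and the ongoing enrichment of VepKar with new material and text markup, users can handle a wide range of scientific and applied tasks. In creating the national VepKar corpus, its developers and managers strive to preserve and present the state of the Veps and Karelian languages in the 19th-21st centuries.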
Translated by Google Translate
Imitation learning (IL) is a simple and powerful way to use high-quality human driving data, which can be collected at scale, to identify driving preferences and produce human-like behavior. However, policies based on imitation learning alone often fail to sufficiently account for safety and reliability concerns. In this paper, we show how imitation learning combined with reinforcement learning using simple rewards can substantially improve the safety and reliability of driving policies over those learned from imitation alone. In particular, we use a combination of imitation and reinforcement learning to train a policy on over 100k miles of urban driving data, and measure its effectiveness in test scenarios grouped by different levels of collision risk. To our knowledge, this is the first application of a combined imitation and reinforcement learning approach in autonomous driving that utilizes large amounts of real-world human driving data.
Microprocessor architects are increasingly resorting to domain-specific customization in the quest for high performance and energy efficiency. As systems grow in complexity, fine-tuning architectural parameters across multiple sub-systems (e.g., datapath, memory blocks at different levels of the hierarchy, interconnects, compiler optimizations) quickly results in a combinatorial explosion of the design space. This makes domain-specific customization an extremely challenging task. Prior work explores using reinforcement learning (RL) and other optimization methods to automatically explore the large design space. However, these methods have traditionally relied on single-agent RL/ML formulations. It is unclear how well single-agent formulations scale as the complexity of the design space increases (e.g., full-stack System-on-Chip design). Therefore, we propose an alternative formulation that leverages Multi-Agent RL (MARL) to tackle this problem. The key idea behind using MARL is the observation that parameters across different sub-systems are more or less independent, which allows a decentralized role to be assigned to each agent. We test this hypothesis by designing a domain-specific DRAM memory controller for several workload traces. Our evaluation shows that the MARL formulation consistently outperforms single-agent RL baselines such as Proximal Policy Optimization and Soft Actor-Critic across different target objectives such as low power and latency. This work thus opens a pathway for new and promising research on MARL solutions for hardware architecture search.
The outbreak of the SARS-CoV-2 pandemic has pushed healthcare systems worldwide to their limits, resulting in longer waiting times for diagnosis and required medical assistance. With chest radiographs (CXR) being one of the most common COVID-19 diagnosis methods, many artificial intelligence tools for image-based COVID-19 detection have been developed, often trained on a small number of images from COVID-19-positive patients. Thus, the need for high-quality and well-annotated CXR image databases has increased. This paper introduces the POLCOVID dataset, containing CXR images of patients with COVID-19 or other-type pneumonia, and of healthy individuals, gathered from 15 Polish hospitals. The original radiographs are accompanied by preprocessed images limited to the lung area and the corresponding lung masks obtained with a segmentation model. Moreover, manually created lung masks are provided for a part of the POLCOVID dataset and for four other publicly available CXR image collections. The POLCOVID dataset can help in pneumonia or COVID-19 diagnosis, while the set of matched images and lung masks may serve the development of lung segmentation solutions.
In the era of big astronomical surveys, our ability to leverage artificial intelligence algorithms simultaneously for multiple datasets will open new avenues for scientific discovery. Unfortunately, simply training a deep neural network on images from one data domain often leads to very poor performance on any other dataset. Here we develop a Universal Domain Adaptation method, DeepAstroUDA, capable of performing semi-supervised domain alignment that can be applied to datasets with different types of class overlap. Extra classes can be present in either of the two datasets, and the method can even be used in the presence of unknown classes. For the first time, we demonstrate the successful use of domain adaptation on two very different observational datasets (from SDSS and DECaLS). We show that our method is capable of bridging the gap between two astronomical surveys, and also performs well for anomaly detection and clustering of unknown data in the unlabeled dataset. We apply our model to two examples of galaxy morphology classification tasks with anomaly detection: 1) classifying spiral and elliptical galaxies with detection of merging galaxies (three classes including one unknown anomaly class); 2) a more granular problem where the classes describe more detailed morphological properties of galaxies, with the detection of gravitational lenses (ten classes including one unknown anomaly class).
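It is common for data structures such as images and shapes of 2D objects to be represented as points on a manifold. The utility of a mechanism that produces sanitized, differentially private estimates from such data is closely linked to how compatible it is with the underlying structure and geometry of the space. In particular, as recently shown, the utility of the Laplace mechanism on a positively curved manifold, such as Kendall's 2D shape space, is significantly influenced by the curvature. Focusing on the problem of sanitizing the Fréchet mean of a sample of points on a manifold, we exploit the characterization of the mean as the minimizer of an objective function composed of the sum of squared distances, and develop a K-norm gradient mechanism on Riemannian manifolds that favors values producing gradients of the objective function close to zero. For the case of positively curved manifolds, we describe how using the gradient of the squared distance function offers better control over sensitivity than the Laplace mechanism, and demonstrate this numerically on a dataset of shapes of corpus callosa. Further illustrations of the mechanism's utility on a sphere and on the manifold of symmetric positive definite matrices are also presented.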
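To make the characterization of the Fréchet mean concrete, the following is a minimal sketch of the non-private mean on the unit sphere, computed by gradient descent on the sum of squared geodesic distances. It does not implement the K-norm gradient mechanism or add any privacy noise; the function names (`log_map`, `exp_map`, `frechet_mean`) and the choice of the sphere as the example manifold are assumptions made for illustration only.

```python
import math

def normalize(v):
    """Rescale v to unit Euclidean length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def log_map(p, q):
    """Tangent vector at p pointing toward q, with length equal to the geodesic distance."""
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(p, q))))
    theta = math.acos(dot)
    # Component of q orthogonal to p gives the direction of travel.
    u = [b - dot * a for a, b in zip(p, q)]
    norm_u = math.sqrt(sum(x * x for x in u))
    if theta < 1e-12 or norm_u < 1e-12:
        # Identical points (or antipodal ones, where the direction is undefined).
        return [0.0] * len(p)
    return [theta * x / norm_u for x in u]

def exp_map(p, v):
    """Point reached by following the geodesic from p along tangent vector v."""
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_v < 1e-12:
        return list(p)
    moved = [math.cos(norm_v) * a + math.sin(norm_v) * x / norm_v
             for a, x in zip(p, v)]
    return normalize(moved)

def frechet_mean(points, steps=100, tol=1e-10):
    """Minimize F(m) = (1/2n) * sum_i d(m, x_i)^2 over the unit sphere."""
    mean = normalize(points[0])
    for _ in range(steps):
        logs = [log_map(mean, q) for q in points]
        # The negative Riemannian gradient of F is the average of the log maps.
        step = [sum(col) / len(points) for col in zip(*logs)]
        if math.sqrt(sum(x * x for x in step)) < tol:
            break
        mean = exp_map(mean, step)
    return mean
```

For a sample placed symmetrically around the north pole, `frechet_mean` should return a point close to `[0, 0, 1]`. The norm of `step` at the returned point is the gradient norm of the objective, which is the quantity the abstract says the mechanism prefers to be close to zero.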
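Throughout computational science, there is a growing need to use the continual improvements in raw computational horsepower to achieve greater physical fidelity through scale bridging, rather than through brute-force increases in the number of mesh elements. For instance, quantitative predictions of transport in nanoporous media, which are critical to hydrocarbon extraction from tight shale formations, are impossible without accounting for molecular-level interactions. Similarly, inertial confinement fusion simulations rely on numerical diffusion to simulate molecular effects such as non-local transport and mixing, without truly accounting for molecular interactions. With these two disparate applications in mind, we develop a novel capability that uses an active learning approach to optimize the use of local fine-scale simulations for informing coarse-scale hydrodynamics. Our approach addresses three challenges: forecasting the continuum coarse-scale trajectory in order to speculatively execute new fine-scale molecular dynamics calculations, dynamically updating the coarse scale from fine-scale calculations, and quantifying uncertainty in neural network models.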
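Artificial intelligence (AI) has emerged as a transformative and versatile tool, breaking new frontiers across scientific domains. Among its most promising applications, AI research is flourishing in concrete science and engineering, where it has offered new insights into mixture design optimization and service life prediction of cementitious systems. This chapter aims to uncover the main research interests and knowledge structure of the existing literature on AI for concrete materials. To begin with, a total of 389 articles published from 1990 to 2020 were retrieved from the Web of Science. Scientometric tools such as keyword co-occurrence analysis and document co-citation analysis were adopted to quantify the features and characteristics of the research field. The findings bring to light pressing questions in data-driven concrete research and suggest future opportunities for the concrete community to fully utilize the capabilities of AI techniques.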
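As an illustration of keyword co-occurrence analysis, the sketch below counts how often each pair of keywords appears together in the same record. It is a generic example rather than the chapter's actual pipeline; the function name `keyword_cooccurrence`, the input layout (one list of keywords per article), and the sample keywords are assumptions.

```python
from collections import Counter
from itertools import combinations

def keyword_cooccurrence(records):
    """Count unordered keyword pairs that appear together in one record.

    `records` is an iterable of keyword lists, one list per article
    (a hypothetical layout chosen for this sketch).
    """
    counts = Counter()
    for keywords in records:
        # Normalize case and whitespace, and drop duplicates within a record.
        unique = sorted({k.strip().lower() for k in keywords if k.strip()})
        # Sorting makes each pair appear in one canonical order.
        counts.update(combinations(unique, 2))
    return counts

# Toy input, invented for illustration only.
sample = [
    ["Concrete", "Machine learning"],
    ["concrete", "machine learning", "durability"],
]
pair_counts = keyword_cooccurrence(sample)
```

The resulting counter can serve as the weighted edge list of a co-occurrence network, in which keywords are nodes and pair counts are edge weights.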
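This paper presents research on a prototype developed to serve the quantitative study of public policy design. This sub-discipline of political science focuses on identifying actors, the relations between them, and the tools at their disposal in health, environmental, economic, and other policies. Our system aims to automate the process of gathering legal documents, annotating them with Institutional Grammar, and using hypergraphs to analyze the interrelations between key entities. Our system is tested on the 2003 UNESCO Convention for the Safeguarding of the Intangible Cultural Heritage, a legal document that regulates essential aspects of international relations concerning the safeguarding of cultural heritage.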
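One simple way to picture the hypergraph step is to treat each annotated statement as a hyperedge that connects every entity it mentions. The sketch below is only an illustration under that assumption; the data layout, the function names (`build_incidence`, `shared_statements`), and the placeholder entities are hypothetical and are not taken from the paper or from the Convention.

```python
from collections import defaultdict

def build_incidence(hyperedges):
    """Map each entity to the set of hyperedge ids it takes part in.

    `hyperedges` maps a statement id to the set of entities it mentions
    (a hypothetical layout chosen for this sketch).
    """
    incidence = defaultdict(set)
    for edge_id, members in hyperedges.items():
        for entity in members:
            incidence[entity].add(edge_id)
    return dict(incidence)

def shared_statements(incidence, a, b):
    """Return the ids of statements in which entities a and b both appear."""
    return incidence.get(a, set()) & incidence.get(b, set())

# Placeholder statements and entities, invented for illustration only.
statements = {
    "s1": {"actor_A", "actor_B", "object_X"},
    "s2": {"actor_B", "actor_C"},
    "s3": {"actor_A", "actor_B"},
}
incidence = build_incidence(statements)
common = shared_statements(incidence, "actor_A", "actor_B")
```

Unlike an ordinary graph edge, a hyperedge keeps all participants of a statement together, so a statement linking three or more entities is not split into pairwise relations.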
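Climate change increases the number of extreme weather events (storms, heavy rains, wildfires) that compromise power system reliability and lead to multiple equipment failures. Real-time and accurate detection of potential line failures is the first step toward mitigating the impact of extreme weather and activating emergency controls. The nonlinearity of the power balance equations, increased generation uncertainty during extreme events, and a lack of grid observability compromise the efficiency of traditional data-driven failure detection methods. At the same time, modern machine learning methods based on neural networks require a large amount of data to detect an accident, especially in a time-changing environment. This paper proposes a physics-informed line failure detector (FIELD) that leverages grid topology information to reduce sample and time complexity and to improve localization accuracy. Finally, we illustrate the superior empirical performance of our approach compared with state-of-the-art methods on various test cases.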